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Abstract 

This paper extends the cure rate model considered in Lopes 
and Bolfarine (2012) by taking into account random effects in 
both, the probability of cure and the survival function for 
individuals that are at risk. The model is parametrized in 
terms of the cured fraction which is then linked to covariates. 
The estimation is based on the restricted maximum 
likelihood (REML) approach proposed in McGilchrist and 
Yau (1995). Simulation studies are performed and results 
based on a real data set are presented indicating good 
performance of the proposed approach. 
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Introduction 

An implicit assumption with the ordinary survival 
models is that all individuals under study are 
susceptible to the event of interest, which is not always 
in the case given the improvement in disease 
treatments experienced in the last decades. For some 
types of cancer, for example, great improvement has 
increased the probability that an individual is 
considered with the disease under control (usually 
denoted by a cure). The amount of cured individuals 
after a treatment is typically known as the cure fraction. 
An indication of the presence of cured individuals in a 
specific treatment is given by the value that the 
Kaplan-Meier estimator takes at the maximum 
survival time. A formal test for the presence of a cure 
fraction using the Kaplan-Meier estimator can be seen 
in Mailer and Zhou (1996). 

Berkson and Gage (1952) developed a model that came 
to be known in the literature as the mixture model for 
cure fraction where there is a proportion p of 
susceptible individuals and, hence, a proportion 1 - p 
of cured individuals. This model has been studied by 



several authors including Goldman (1984), Sposto et al. 
(1992) and Mizoi et al. (2007), among others. An 
alternative route was pursued by Yakolev and 
Tsodikov (1996) and Chen et al. (1999). Their approach 
is based on the assumption that each individual has a 
fixed unobserved (latent) number N of factors, each 
capable of triggering the event of interest. This model, 
presented in Section 2, known in the literature as the 
promotion time cure rate model has been the subject of 
intense research activity. Further references can be 
found in Lopes and Bolfarine (2012). 

Some literature connecting and unifying both, the 
mixture and promotion time models are Yin and 
Ibrahim (2005) and Rodrigues et al. (2009) 

This paper aims to consider situations where the 
collected data come from several clusters and are 
obtained using repeated measures. For example, in 
multicenter clinical trials, patients from a given clinic 
(cluster) can be affected by unobserved variables 
specific to that particular center. Therefore, it seems 
appropriate to include in the model random effects to 
take into account the dependence generated in patients 
treated in a same clinic. Yau and Ng (2001), working 
with a mixture model incorporates random effects in 
both, the cure fraction (Vj ) and on the survival function 
of the susceptible individuals (Uj), assuming that the 
random effects are normally and independently 
distributed. Lai and Yau (2008) worked with the 
bivariate normal to (Uj,Vj) in the mixture model, 
assuming a possible dependence between the two 
random effects in a same clinic. However, either was 
developed a classical approach to parameter 
estimation. Lopes and Bolfarine (2012) incorporated 
random effects in the promotion time cure rate model 
only on the cure fraction and developed classical as 
well as Bayesian approaches for parameter estimation. 
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In this paper, the model in Lopes and Bolfarine (2012) 
is extended to develop a classical approach to the 
parameter estimation in the promotion time cure rate 
model incorporating random effects for both, the 
survival time of susceptible individuals and in the cure 
fraction. 

Section 2 presents the ordinary promotion time cure 
rate model. In Section 3, the model is extended, by 
incorporating random effects. In Section 4, the classical 
approach to parameter estimation is developed, based 
on the BLUP and REML estimation approaches. In 
Section 5, an application of the proposed model is 
considered for a real data set. In Section 6, a simulation 
study is presented to evaluate the proposed estimation 
procedures. Final considerations and discussions are 
presented in Section 7, where it is concluded that the 
proposed estimation approach is fairly satisfactory and 
that the model is a viable alternative to fit real data 
where it seems adequate to utilize random effects 
along with a cure fraction. 

The Promotion Time Cure Rate Model 
(PTCRM) 

The model proposed in Chen et al. (1999) can be 
defined in the following way. Suppose that an 
individual in the population has N carcinogenic cells at 
the time entering the study, where N ~ Po(9), 9 > 0. 
Moreover, W a is denoted as the random variable 
expressing the time the a-th cell needs to produce a 
detectable cancer. For non-cured subjects, there are 
cells such as N > , with W a , a = 1,2, ...,N, 
conditionally independent given N and identically 
distributed with common survival function F(t\ic) = 
l-S(t\K), where k is a set of unknown parameters, 
properly defined in some parameter space. For cured 
subjects, N = and it is assumed that P(W — °°) = 1. 
The distribution F is related to the susceptible 
individuals and, in general, it is a proper function in 
the sense that lim t ^ M F(t\K) = 1. Under this framework, 
the time for a cancer to be detected can be represented 
by a random variable T = mM$W a , < a < N}. In a 
word, for non-cured individuals, the failure time is the 
minimum among the times the cells may take to, 
eventually, produce a detectable cancer whereas cured 
individuals will never experience the event of interest 
and the failure time in this case is infinity. Under such 
conditions, it can be verified that the (improper) 
survival function for the random variable T, also called 
the population survival function, is given by 

S p {t) = e-*?™. (1) 
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Therefore, the cure fraction in the population is given 
by lim t ^ M S p (t) = e~ e > 0. Assuming right censoring, 
the actual data for the i-th subject may be represented 
by the random variables Y t = minCTi, Q) and 8 t — 
I(Ti < Q), where T t and Q are failure and censoring 
times, respectively, fort = 1, ...,n. The observed data 
will be denoted by D obs = (y,S), where y = (y v ...,y n ) 
and S = (Si, ...,S n ). It is noted that N = (jV 1; ... ,N n ) are 
non-observable, and the complete data will be denoted 
by D = (y, S, N) . Therefore, the complete likelihood 
function for (k, 9) can be written as 

L(k,9\D) = (n"=i S (yi I k) Ni ~ Si [JVj/(jj I k)] *' ) x 
ex P Sf =1 [N t log(0) - logOV; !)] - n9}. (2) 

The marginal likelihood with respect to N is [Chen et 
al. (1999)] 

L(K,9\D obs ) = 

nr=i0/(y») fii exp{-0(l -S(y»)}. (3) 

Consider now the particular case where the lifetimes 
W a , a = 1,2,..., have a Weibull distribution. Although 
other distributions could also be considered, the 
Weibull has a number of advantages: it is popular, 
easy to interpret and has the flexibility of 
accommodating different shapes for the hazard 
function, depending on the shape parameter a . 
Moreover, for the practical situation motivating the 
model, it is expected a > 1. 

For non-homogeneous populations, let x t = 
(x a , ... ,x lr ) T be a covariate vector associated with the 
lifetimes W a , a = 1,2,.... Following Yau and Ng (2001) 
is considered 

5(t|fi,K) = [S (t\K)r^i, i = l n, (4) 

where S (t|K) = exp{-At a ], k = (A,a) and ^ = xfy. 

As for the cure fraction, a vector of covariates z t = 
(z a , ... , z is ) T may be assumed. Consider 

9 t = exp(z[/?), i = 1, ...,n. 

The quantities y and /? are, respectively, r x 1 and 
s x 1 vectors of unknown parameters. The choice of 
covariates for each part depends on the practical 
problem being considered and on the information 
given by the physician. 

Note that the linear function x\y should exclude an 
intercept because if that is the case then, 

S(t\^,K) = [5 (t|K)] ex P^ = [S (t\K)] ex V^ + Z r k=l x ikYk] = 

[S (t|K*)] exp{a = 1 * Wt> '' t} , (5) 
which will lead to identifiability problems, as 
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mentioned in Yau and Ng (2001). 

Promotion Time Cure Rate Model with 
Random Effects 

Suppose rij patients grouped in clinic j,j = 1, The 
expanded notation is considered in the previous 
section by means of the proposition that Ny represents 
the number of carcinogenic cells associated with the i- 
th individual from the j-th clinic, with Ny ~Po(dy). For 
j = 1, ...,], define the vector Nj = (JV 1; ,N 2j , ... ,N n .jY . 
Let W ai j be the random variable corresponding to the 
time the a-th carcinogenic cell from the t-th individual 
within the j-th clinic needs to yield a detectable cancer. 
For a given subject, the random variables W aij , a = 
l,...,Ny, conditionally independent given Ny have a 
common survival S k). Hence, the time up to a 
detectable cancer can be defined as Ty = min[^W a y , 1 < 
a < Ny} for non-cured individuals and P(Ty = °°) = 1 
if Ny = 0. The survival function S(- \$y, k) is the same 
as in (4), with covariates Xy and Zy included in the 
model in the same way as discussed in the previous 
section. However, in order to take into account the 
effect of the y'-th clinic on survival for cured and 
uncured subjects, random effects (Uj,Vj),j = 1, are 
considered so that 

f y = xjj Y + Uj and By = exp{zjj P + Vj}, (6) 

where {U jt V s )~N 2 (fi 2 ,t),j = 1, , with 1 = 
diag((Tu,a^). For a given clinic j, high (low) values of 
Uj will lead to an increase (decrease) in the risk to 
activate a particular cancer cell; as a consequence 
subjects from that clinic will experience a greater 
(smaller) risk of developing a carcinogenic tumor. 
Similarly, low (high) values of V- indicate that patients 
treated in the )-th clinic have greater (smaller) chance 
of being cured. 

The model in Lopes and Bolfarine (2012) is a particular 
case of (6), when Vj ~N(0, er^),; = 1, ... J and er^ = 0. 

It should be mentioned here that the same non- 
identifiability issues discussed in Yau and Ng (2001) 
may arise, specially in trials with very long follow-up 
times, when it is difficult to distinguish between cure 
and censoring for long-term survivors. 

Given that a non-intercept model is considered, the 
random effects Uj,j = l,...,J will represent those 
quantities in this new formulation and need to be 
estimated. The solution proposed in Yau and Ng (2001) 
may be considered: the remaining parameters are 



estimated assuming the quantity 5 (t| k) is known. 
Define U = (U lt ...,Uj) T and V = (V v ... ,Vjf . The 
complete log-likelihood function for the model (after 
summing over N — (N v - ,Nj)) is given by I = ^ + l 2 , 
in which 

;=1 1=1 



h = (-1/2) 



+ (-1/2) 



/log(27ra u 2 ) + y U T u\ 

;iog(27r^ 2 ) + y v T v\, 



with F{t\$i,K) = 1 - [S (t|K)] ex P f< and S (t|ie) = 
exp{-/lt a }. 

Estimation 

In the previous section it was observed that the log- 
likelihood function / may be expressed as a sum of 
two quantities ^ and l 2 , where due to the unobserved 
quantities U and V, it can not be directly maximized. 
One possible way to deal with this problem is to 
consider the quadrature procedure discussed in Davis 
and Rabinowitz (1975). Typically, this approach 
requires high computation times since in the situation 
discussed here there are 2] random effects to deal with; 
moreover, there is the question of how many knots to 
consider, which can significantly affect the estimation 
procedure. 

Another alternative is to consider the restricted 
maximum likelihood (REML) approach proposed by 
McGilchrist and Yau (1995) as well as Yau and Ng 
(2001), Lai and Yau (2008) and Lopes and Bolfarine 
(2012). Initially, define < = (y,p,U,V). Assuming that 
S (t|jc) is known and the value of I is fixed, it is 
possible to find the BLUP estimator for £ using the 
Newton-Raphson algorithm [Henderson (1975)], 
including the predictors for U and V. Starting with £ , 
an initial value for £ , for £=1,2,..., the iterative 
procedure is based on 



(7) 



where Bis the negative of the the second derivative of I 
with respect to More details on this procedure are 
provided in Appendix A. Denote X as the estimator for 
£ given by (7) after convergence. Denote the 
partitioned matrix B and its inverse, evaluated at X 
according to the ordering y | B \ U \ V , respectively, as 
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B= 



( &11 B i2 B 13 B 14 \ 

B 21 B 2 2 B 23 B 24 

B31 B32 ^33 B34 

\b 41 B 42 B 43 B44J 



and 



/ A n A 



B 1 - 



12 



»13 



A21 A 22 A 23 A 24 



*3l 



-^32 -^33 
A & 



A34 
A 44 / 



\A 41 A 42 /i 43 

According to McGilchrist and Yau (1995) and Yau and 
Ng (2001), the REML estimate for I is then given by 

I = J^diag^rA^ + D^tr^ + V T V). 

Moreover, 



(8) 



\P) \A 2l A 22 ) 
Var(2) = (! 11 ! 12 \ 

\^21 ^22/ 



with 



£11 = au 4 Q ~ 2a~ 2 trA 33 ) + a~hv (3| 3 ), 
2 22 = oi" 4 (/ " 2ff- 2 tr 3 44 ) + a-hr (A 2 44 ) and 
^12 = a~ A a-\x (A 34 A 43 ). 

In order to deal with the unknown S (t\K), plug the 
REML estimator obtained for £ and I into I and denote 
the resulting profile likelihood for k by l p . Considering 
a bidimensional grid for K , values for X and a 
maximizing l p are obtained. The Newton-Raphson 
approach can also be used. Considering k a initial 
value for k, the Newton-Raphson algorithm is based 
on 



K f = K 



[-1 



+ CJ*(%)\ ,1=1,2, 



(9) 



where C is the negative of the second derivative of the 
log-likelihood l p with respect to k. Details of such 
procedure are given in Appendix B. As pointed out in 
Yau and Ng (2001), the asymptotic variance of the 
limiting value k in (9) is not given by the inverse of C, 
because the expected values of the partial derivatives 
with respect to A or a considering the other parameters 
as fixed are not zero. However, a resampling method 
as the jackknife [Miller (1974)] may be considered to 
compute the standard deviations for the estimator of k. 
After getting such estimates, it is necessary to restart 
the procedure to update the estimates for £ and E, with 
new updated values, compute the updated estimate 
for k, and so on. In summary, to find estimates for I 
and k, one has to iterate the procedures described by 
equations (7), (8) and (9) until convergence. 



Model diagnostics can be performed considering a 
graphical analysis based on the Cox-Snell residuals, 
defined as 



r i} = H^ij-.xp.Xij.Zij)^ = l,...,v j ;j = 1.....J. 



(10) 



where //(•; xp, x t j , z y ) is the cumulative risk function 
evaluated at the parameter estimates xp = (£, S, k) . If 
the model is correct, then r^, j = 1, i = 1, ...,n ; , 
should behave as a censored sample from an 
exponential distribution with mean one. A 
disadvantage of such procedure is that if the proposed 
model is not adequate, the Cox-Snell residuals will not 
give any information on the source of inadequacy. 

Application 

Kalbfleisch and Prentice (2002) have presented a data 
set related to a trial to compare treatments for 
oropharynx carcinoma. A total of 195 subjects, with 
lesions in three different sites is considered: faucial 
arch, tonsillar fossa and pharyngeal tongue. Each 
patient was enrolled in one of six participating clinics 
and randomly assigned to one of the treatments: 
radiation therapy and radiation therapy combined 
with chemotherapy. Information on some covariates 
(gender, tumor stage classification (T -stage), lymph 
node stage classification ( N -stage), among others) 
which can have influence on the survival of the 
patients are also available. The aim of the study is to 
identify risk factors for cancer and compare the 
treatments with respect to survival prognostic. Patients 
undergone the treatment for a period of 90 days; after 
that they received post-treatments. Since there were no 
restrictions, the clinics could use different approaches 
for the post-treatment, which could have some 
influence on patients survival times as well as on the 
cure rates. This also may lead to some correlation 
between observed quantities such as the survival times 
for patients belonging to same clinic. 

In Kalbfleisch and Prentice (2002) the clinics were 
considered as fixed effects. Consider now the 
proposed approach following the same lines as in Yau 
and Ng (2001) and Lopes and Bolfarine (2012), so that 
only patients with cancer located in the pharyngeal 
were considered, corresponding to a total of 66 
patients, out of which 47 died and 19 were censored 
(29%). Variable 7-stage measuring the tumor size is 
grouped in 4 categories: T lr T 2 and 7 3 which refer to 
primary tumor measurement and T 4 to massive 
invasive tumor. For the f-th individual in the ;-th 
clinic, define the 
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_ fl.ii 
Xi i to, i 



1, if tumor is classified as T , T? or T- 



3 ■ 



if tumor is classified as TV 



TABLE 2 BLUP PREDICTORS FOR THE RANDOM EFFECTS. 



Failure is defined as death due to the tongue or 
oropharynx carcinoma. Deaths related to other causes 
were considered as censored data. Previous analyses 
[Kalbfleisch and Prentice (2002), Yau and Ng (2001), 
Lai and Yau (2008) and Lopes and Bolfarine (2012)] 
agree that there is strong evidence of a cure fraction in 
this data set. The proposed model in Section 3 is 
considered with 

• x = (*«/) B . xl ' J = 1 6 ' 1 = 1 n J' n * = 

• Z = (l n *,X), where l n * is an-column vector of 
ones, 

• Y = Y> an d 

• n x = 9, n 2 = 14, n 3 = 16, n 4 = 7,n 5 = 11, n 6 = 
9,n* = 66. 

Note that the same set of covariates to model the cure 
fraction and survival of non-cured individuals has 
been considered. Additionally, the cure rate of Chen et 
al. (1999) (i.e., with no random effects) based on (1) 
and an ordinary Weibull survival model were also 
fitted, that is, neither one took into account the effect of 
clinic. 

Results of model fitting for the models described 
above are presented in Table 1. In addition, random 
effects estimates for each one of the clinics are shown 
in Table 2. 

TABLE 1 ML AND REML ESTIMATES FOR THE MODEL 
PARAMETERS (SE). 



Parameter 


PTCRM with random 


Independent 




effects (REML) 


PTCRM (ML) 


Y 


0.528 (0.207) 


0.422 (0.375) 


ft 


0.275 (0.039) 


0.240(0.244) 


ft 


0.441 (0.156) 


0.520(0.409) 


ot 


0.003 (0.028) 




°l 


0.029 (0.014) 




X 


0.00006 (0.00009) 


0.0025 (0.00056) 


a 


1.478 (0.281) 


1.472 (0.374) 




Independent 




Parameter 


Single Weibull 






(ML) 




Y 


0.573(0.177) 




ft 






ft 
°t 






°l 






X 


0.00016 (0.00014) 




a 


1.073 (0.134) 





Clinic 




«5 


1 


0.0020 


-0.0419 


2 


0.0010 


0.1043 


3 


-0.0096 


-0.0861 


4 


0.0025 


-0.0294 


5 


0.0020 


0.0498 


6 


0.0022 


0.0033 



The figures in the first row of Table 1 suggest that 
when the cure fraction is excluded from the analysis, 
tumor stage seems to have a significant effect on the 
survival of susceptible patients. On the other hand, 
when a cure fraction model without random effects is 
considered, tumor stage no longer seems to be 
significant. In contrast, the results presented in Yau 
and Ng (2001) and Lai and Yau (2008) considering the 
mixture model have suggested that this variable is 
significant. For the random effect model considered 
here (first column of the table), tumor stage seems also 
to be significant on the cure fraction. In addition, the 
values for random effects shown in Table 2 are close to 
zero and, probably, non-significant given the estimates 
of their variances. However, a formal test for the 
hypothesis H : = = to be developed to allow a 
more reliable assessment will be considered in a future 
work. 

Point estimates for the cure rates e~ e 'i for patients with 
small primary tumor (X = 0) and a massive tumor 
(X = 1) are presented in Table 3. Inspecting the values 
under an exploratory perspective, it seems that Clinic 3 
is the one performing the best among all the clinics, so 
that in average, the activation time of the cells for 
susceptible individuals is higher. The estimated hazard 
ratio is = 1.70, so the cells of non-cured subjects 
with a massive tumor have a 70% higher risk to cause 
death when compared to non-cured patients with 
tumors in other stages. 

TABLE 3 ESTIMATED CURE RATE FOR THE SIX CLINICS. 



Clinic 



Estimate cure rate 
Z = Z = 1 



1 


0.28 


0.14 


2 


0.23 


0.10 


3 


0.30 


0.15 


4 


0.28 


0.14 


5 


0.25 


0.12 


6 


0.27 


0.13 



Cox-Snell's residuals were computed to check model 
fitting, considering (10). The cumulative risk function 
for those residuals was computed and compared with 
the cumulative distribution for the unitary exponential 
distribution. As it can be depicted from Figure 1, the 
two cumulative risk functions are close, suggesting a 
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FIG. 1 CUMULATIVE HAZARD FUNCTIONS FOR COX-SNELL 
RESIDUALS AND EXPONENTIAL DISTRIBUTION (SOLID LINE). 

Simulation 

The results of a simulation study performed for the 
promotion time cure rate model with random effects 
have been reported so that results are comparable with 
the simulation study performed by Yau and Ng (2001). 
10 clinics each with 15 patients are taken into 
consideration. Patients are randomly assigned to the 
treatment group (x t j = 1) and the remaining patients 
are assigned to the control group (xy =0), for each 
clinic. Following the simulation in Yau and Ng (2001), 
the same covariate for the cure fraction with the 
addition of an intercept was in consideration as well, 
i.e., Zy = [1, x« ] . Values of JVy are simulated from the 
Poisson distribution with mean 0y . The values of 0y 
are given as in (6) and the Vj are generated from 
jV(0,er2). For cured individuals, that is, iVy = 0, the 
failure time is infinity and for non-cured individuals, 
random variables W X j, ... , W N j representing promotion 
times, with density given by 

/o(t,*y) = /i (t| K )exp!^kyy + Uj]S (.t\K)<*'*'vr +u J) 
have been generated, with Uj generated from the 
jV(0,(Tu) and the minimum values of these survival 
times considered to be failure times. If the survival 
time is greater than the censoring time C, then it is 
considered to be a censored observation. In the 
simulation study, 5 (t|K) is Weibull as in (4) is 
regarded, with parameters A and a. We set A = 0.01 
was set in all the simulations and 3 groups for 
parameters Y,Po,Pi were in consideration. For each 
group simulations were performed with {a = 1, cr„ = 
0.5, Oy = 0.5, C = 500}, and the effect of changes on 
a,a^,cTy and C was studied. For each combination, 500 
simulations were generated. 
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We expect that 

• As a varies from 1 to 1.3, the performance of 
the estimators is improved since the Wy will 
have smaller expectation making it more 
probable that minipVy , i = 1, . . . , Nu } is smaller 
than C allowing an easier identification on 
cured patients. 

• As the variances of the random effects grow 
from 0.5 to 1 and the censoring times is 
reduced from 500 to 300, estimators 
performance will be worse since the variances 
of the random effects are increased (and 
consequently the variability of the generated 
failure amplifies because the censoring time 
will increase in the group of susceptible 
individuals. 

TABLE 4 ESTIMATED BIASES FOR 500 SIMULATIONS OF 
REML ESTIMATION. 

Parameter True Average SE t SE 2 

value bias 
(a) Sim. Set I (X = 0.01, y = -0.5, ft, = 0,ft = 0.5) 

Sim. 1 (a = 1, al = 0.5, al = 0.5, C = 500) 



(b) 



Y 


-0.5 


0.003 


0.367 


0.403 


ft 


0.0 


-0.004 


0.265 


0.284 


ft 


0.5 


0.015 


0.299 


0.314 


°l 


0.5 


-0.022 


0.327 


0.397 




0.5 


0.034 


0.319 


0.331 




Sim. 2 (a = 1.3, <r„ = 


= 0.5, al = 


= 0.5, C = 500) 




Y 


-0.5 


0.005 


0.273 


0.290 


ft 


0.0 


0.032 


0.259 


0.273 


ft 


0.5 


-0.001 


0.244 


0.240 


°l 


0.5 


-0.006 


0.306 


0.322 




0.5 


0.010 


0.301 


0.312 




Sim. 3 (a = 1, 


= W = 


1,C = 500) 




Y 


-0.5 


0.003 


0.406 


0.443 


ft 


0.0 


0.004 


0.352 


0.344 


ft 


0.5 


0.011 


0.331 


0.335 


°l 


1.0 


-0.005 


0.600 


0.766 


°l 


1.0 


0.062 


0.594 


0.713 




Sim. 4 (a = 1, = 


0.5, <7„ 2 = 


0.5, C = 300) 




Y 


-0.5 


-0.036 


0.553 


0.565 


ft 


0.0 


0.016 


0.271 


0.265 


ft 


0.5 


0.056 


0.446 


0.426 


«2 


0.5 


0.030 


0.391 


0.458 




0.5 


0.051 


0.338 


0.376 


m. Set II (A = 0.01,y = " 


0.5, /?„ = 


-0.5, ft = 0.5) 






Sim. 5 (a = 1, = 


0.5, (Ty = 


0.5, C = 500) 




Y 


-0.5 


0.006 


0.383 


0.426 


ft 


-0.5 


0.019 


0.272 


0.277 


ft 


0.5 


-0.009 


0.308 


0.314 


°l 


0.5 


0.011 


0.360 


0.447 


°l 


0.5 


0.022 


0.326 


0.362 




Sim. 6 (a = 1.3, tr„ = 


= 0.5, al = 


= 0.5, C = 500) 




Y 


-0.5 


0.010 


0.275 


0.291 


ft 


-0.5 


0.029 


0.268 


0.276 


ft 


0.5 


-0.014 


0.253 


0.250 


°l 


0.5 


0.027 


0.333 


0.343 




0.5 


0.010 


0.313 


0.328 
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TABLE 4 ESTIMATED BIASES FOR 500 SIMULATIONS OF 
REML ESTIMATION (CONTINUATION). 



Parameter 


True 
value 


Average 
bias 


CP* 




(b) Sim. Set II (X 


= O.Ol.y = 


-0.5,/? = -0.5 


ft = 0.5) 




Sim 


7 (a = \,al = = 1, C 


= 500) 




Y 


-0.5 


0.012 


0.399 


0.438 


ft 


-0.5 


0.022 


0.354 


0.373 


ft 


0.5 


-0.020 


0.318 


0.322 


°l 


1.0 


0.030 


0.630 


0.847 


°l 


1.0 


0.021 


0.586 


0.686 


Sim. 8 (a = 1, cr 2 


= 0.5, (Jy = 


0.5, C = 300) 







Y -0.5 -0.059 0.618 0.618 

ft -0.5 0.003 0.278 0.270 

ft 0.5 0.051 0.484 0.441 

of 0.5 0.112 0.456 0.516 

(*l 0.5 0.028 0.344 0.388 



(c) Sim. Set III (A = 0.01,/ = -l,/?„ = = 1) 

Sim. 9 (a = 1, o- 2 = 0.5, (T 2 = 0.5, C = 500) 



Y 


-1.0 


-0.095 


0.753 


0.704 


ft 


-1.0 


0.018 


0.285 


0.288 


ft 


1.0 


0.101 


0.634 


0.533 




0.5 


0.080 


0.429 


0.530 




0.5 


-0.003 


0.336 


0.373 


Sim. 10 (a 


= 1.3, ol 


= 0.5, (T 2 = 0.5, C 


= 500) 




r 


-1.0 


0.019 


0.282 


0.278 


ft 


-1.0 


0.062 


0.285 


0.288 


ft 


1.0 


-0.006 


0.269 


0.274 




0.5 


-0.007 


0.333 


0.356 


<r 2 


0.5 


0.019 


0.329 


0.357 


Sim. 11 (a 


= l.ff 2 = 


1,ct 2 = 1,C = 500) 




r 


-1.0 


-0.033 


0.566 


0.614 


ft 


-1.0 


0.046 


0.364 


0.359 


ft 


1.0 


0.026 


0.454 


0.443 




1.0 


0.034 


0.669 


0.928 




1.0 


0.009 


0.602 


0.715 


Sim. 12 (a 


= 1.^ = 


0.5, (jy = 0.5, C = 


300) 




Y 


-1.0 


-0.120 


1.045 


0.785 


ft 


-1.0 


0.020 


0.289 


0.310 


ft 


1.0 


0.163 


0.870 


0.622 




0.5 


0.242 


0.566 


0.632 


T, 2 


0.5 


-0.020 


0.353 


0.410 



Simulation results are presented in Table 4. SE 1 and 
SE 2 represent, respectively, the mean of the standard 
deviations and the standard deviations of the estimates 
for the 500 simulated samples. The table reveals that 
the bias corresponding to the parameters related to the 
cure fraction, namely ft and ft, is small except for the 
case C = 300 for which the bias of ft is substantially 
increased, specially for parameter groups II and III. 
Concerning the standard deviation, it can be noticed 
that SE 1 is slightly smaller than SE 2 indicating that the 
standard deviations of ft and ft are slightly 
underestimated. One exception occurs in the 
parameter group III (C = 300) for which the standard 
deviation was considerably underestimated. 
Regarding y, it can be noticed that the bias is small 
except for group III and C = 300 . The standard 
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deviation of y is slightly underestimated, except for 
parameter group III (C = 300) for which the standard 
deviation is substantially overestimated. 

For the random effects it can be depicted that the bias 
is small, an exception being the case when a£ and 0% 
increase from 0.5 to 1.0 and C decreases from 500 to 
300, in which case the bias increases, although the bias 
of (Jy is always smaller than that of a 2 . The variances 
are slightly underestimated in all parameter groups, 
except in the case where t7„ and al increase and the 
censoring variable diminishes. It is also worth noticing 
that as a increases from 1.0 to 1.3, the performance 
(evaluated in terms of standard deviation) of all 
estimators improves. 

Final Discussion 

This paper extends the cure rate model in Lopes and 
Bolfarine (2012) by considering random effects in both, 
the cure probability and the survival function for 
individuals that are at risk. The random effects are 
considered to be normally distributed. An approach to 
model adequacy is developed defining Cox-Snell type 
residuals for the proposed model. The model is 
parametrized in terms of the cured fraction which is 
then linked to covariates. The approach used for 
parameter estimation is the restricted maximum 
likelihood (REML) approach proposed in McGilchrist 
and Yau (1995). Simulation studies are performed and 
results of a real data analysis are presented indicating 
good performance of the proposed approach. 
Moreover, residual analysis performed for the model 
considered to analyze the real data has indicated that 
the methodology considered in this paper is suitable to 
deal with real data sets of the considered type. An 
important case exclusive from consideration in this 
work is the situation where the random effects Uj and 
Vj are correlated, which could be the case in many 
practical applications. This generalization is the subject 
of a forthcoming paper. 

Appendix A: The BLUP Estimation 
Procedure 

The score and matrix B , computed based on the 
second derivative of I with respect to are as follows: 



X' 









- 





Z' 




\dh/df] 







G' 







dhld(f>. 




a~ 2 U 


.0 


G' 
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B 



+ 



X' 

z 

G' 
G' 





"u V 





]\X G 01 

'J l-o z o d 



-d\/dfdf' -d 2 lJd$d(f>'}\X G 01 
-dHjdQd? -d 2 ljd<t)d<t) 






a- 2 h 



with f = A"y + G'tf and <j> = Z'B + G'V . 1, is the 
identity matrix of dimension / and X, Z and G are the 
design matrices of y, B and the random effects U and V 
respectively. Let H (t\K) be the baseline cumulative 
hazard. The first and second partial derivatives of ^ 
with respect to the elements of f and </> are given as 
follows: 



= S i j(l-H {t i j\ K )e^j), 



-e^ + ^7/ (t iy |/c)[S (t y |ic)] 
= H (t iJ \K)e^[S i j 



d 2 li 

+ e ^[5 ( ti ,| K )] exp ^(l-H ( t ,| K )e^)L 

^-= 5 ,- e ^(i-[5 (t,| K )r^)- 

a 2 ^ 



deputy 

d 2 h 



= e^(l-[S (t iJ \K)r^), 

= e^^H (t ij \K)[S (t ij \K)r" i \ 



= 0, 



d 2 h 



= 0, 



— 0, for i ^ i or j ^ j ■ 



Appendix B: Estimation of Parameters of 
Baseline Hazard 

For a Weibull baseline hazard, the cumulative hazard 
is H (t\K)=At a . Let S i} = [S (t i} |/c)] exp ^ and 
tj = e' 1 ''' . The first and second partial derivatives of ^ 
with respect to the parameters of the baseline hazard 
and matrix are given as follows 

dl 1 

af = Z Z {*» ( r 1 - * f * C S ) - ef 9 W }' 



7=1 i = l 



^ = Z Z ^ + log ty " Aef 9 ^ log ty ) 

7=1 i=l 

-AV^Vogty}. 

y n ; 

-S=ZZ^( a_2+Aefiy ^ log2t ^ 

7=1 i=l 

+ AV f<,c i"V°g 2 t tf (l-Aef«tg)} f 

y n y 

5 2 l 



3adA 



^ 2, ^ tfj log t« fay + fy^y C 1 - ^ y 



7=1 1=1 



7=1 i=l 
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